Incrementally building FAIR Digital Objects with Specimen Data Refinery workflows

Abstract

Specimen Data Refinery (SDR) is a developing platform for automating transcription of specimens from natural history collections (Hardisty et al. 2022). SDR is based on computational workflows and digital twins using FAIR Digital Objects. We show our recent experiences with building SDR using the Galaxy workflow system and combining two FDO methodologies: open Digital Specimen (openDS) and RO-Crate data packaging. We suggest FDO improvements for incremental building of digital objects in computational workflows.

SDR is realised as the Galaxy workflow system (Afgan et al. 2018) with SDR tools installed. An Open Research challenge is that some tools have machine learning models with a commercial licence. This complicates publishing to the Galaxy toolshed; however, we created Ansible scripts to install equivalent Galaxy servers, including tool dependencies, accounts and workflows. SDR workflows are published in WorkflowHub as FDOs.

We implemented the use case De novo digitization (Brack et al. 2022). As shown in Fig. 1, the workflow steps exchange openDS JSON (Hardisty et al. 2019), for incremental completion of a digital specimen. Initial stages build a template openDS from CSV with metadata and image references; subsequent analysis completes the rest of the JSON with regions of interest, text digitised from handwriting, and recognized named entities. Galaxy can visualise the outputs of each step (Fig. 2), which is important to make FDOs understandable by domain experts and to verify the accuracy of SDR.

We are adding workflows for partial stages, e.g. detection (Livermore and Woolland 2022a) and hand-written text recognition (Livermore and Woolland 2022b), which we'll combine with scalability testing by wider project users. Additional tools will enhance existing and new workflows, such as detection of barcode and museums’ internal identifiers.

We are now ready to publish FAIR Digital Objects, including registration into DiSSCo repositories, PID assignment and workflow provenance. However, even at this early stage we have identified several challenges that need to be addressed.

The lessons highlight challenges that arise because SDR is exchanging FDOs that are not fully completed and not yet assigned persistent identifiers. The openDS schemas are still in development, therefore SDR uses a more flexible JSON schema where only the initial metadata (populated from CSV) is required. Each step validates the JSON before passing it to the underlying command line tool.

Although the tools exchange openDS objects, they cannot be combined in any order. For instance, named entity recognition requires text in the FDO. We consider these intermediate objects as sub-profiles of an FDO Type. Unlike hierarchical subclasses, these profiles are more like ducktyping. For instance, a step may require a particular key, but semantically there is no requirement for OpenDSWithText to be a subclass of OpenDSWithRegion, as text can also be transcribed manually without regions.

Similarly, we found that steps can be executed in parallel, with merging that can be achieved by JSON queries and Schemas, which indicates it may be beneficial to have FDO fragments as separate objects. Adding a fragment would complicate the workflows.

Several tools process the referenced images, currently https URLs in openDS. We added a caching layer to avoid repeated downloading, coupled with local file-paths wiring in the workflow. A similar challenge occurs if accessing image FDOs using DOIP, which, unlike HTTP, has no caching mechanisms.

Galaxy can support importing and exporting Workflow Run Crates, a profile of RO-Crate (Soiland-Reyes et al. 2022b) that captures the execution of a workflow, including its definition (De Geest et al. 2022). We are adopting this for FDO provenance, as envisioned by Walton et al. (2020).

Our prototype de novo workflow returns results as a ZIP file of openDS FDOs. End-users should also get copies of the referenced images and generated visualisations, along with metadata. We are investigating ways to embed preliminary results in the final step, so the result is an enriched RO-Crate.

Conclusions

SDR is an example of machine-assisted construction of FDOs, which highlights the need for intermediate digital objects that are not yet FDO compliant. The “local FDOs” are beneficial not just for efficiency and visual inspection, but also to simplify workflow composition from canonical building blocks. At the same time, we see that it is insufficient to pass only JSON objects, as other referenced data, such as images, is then re-downloaded.

Further work will investigate wrapper types and profiles, in order to restrict “impossible” ordering of steps depending on particular inner FDO fragments. A distinction needs to be made between FDOs in a “draft” state and those that can be pushed to DiSSCo registries.

We are experimenting with changing the SDR components to Canonical Workflow Building Blocks using the Common Workflow Language (Crusoe et al. 2022). This gives flexibility to scalably execute SDR workflows on different compute backends such as HPC or a local cluster, without additional setup of Galaxy servers.
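The duck-typed sub-profiles described in the abstract can be illustrated with a minimal Python sketch. It is not the SDR implementation. The profile names OpenDSWithText and OpenDSWithRegion come from the abstract; the OpenDSInitial profile, the key names (images, regions, text), and the step names are hypothetical and are not taken from the openDS schema. A partial object matches a profile when it carries the required keys, independent of any class hierarchy, and a step may run only when its input profile is matched.

```python
from typing import Any, Dict, List, Set

# Hypothetical sub-profiles: each name maps to the top-level keys it requires.
# Key names are illustrative assumptions, not the actual openDS schema.
PROFILES: Dict[str, Set[str]] = {
    "OpenDSInitial": {"images"},
    "OpenDSWithRegion": {"images", "regions"},
    "OpenDSWithText": {"images", "text"},
}

# Hypothetical workflow steps and the profile each one expects as input.
STEP_NEEDS: Dict[str, str] = {
    "detect_regions": "OpenDSInitial",
    "recognise_text": "OpenDSWithRegion",
    "find_entities": "OpenDSWithText",
}


def satisfies(obj: Dict[str, Any], profile: str) -> bool:
    """Duck-typing check: the object matches if it has all required keys."""
    return PROFILES[profile].issubset(obj.keys())


def matching_profiles(obj: Dict[str, Any]) -> List[str]:
    """Return every profile the partial object currently matches."""
    return sorted(name for name in PROFILES if satisfies(obj, name))


def can_run(step: str, obj: Dict[str, Any]) -> bool:
    """A step is allowed only if the object matches its input profile."""
    return satisfies(obj, STEP_NEEDS[step])


# A manually transcribed object has text but no regions.
initial = {"images": ["https://example.org/specimen.jpg"]}
manual = {**initial, "text": "transcribed by hand"}

print(matching_profiles(manual))
print(can_run("find_entities", manual))
print(can_run("recognise_text", manual))
```

In this sketch the manually transcribed object matches OpenDSWithText without matching OpenDSWithRegion, which is the situation a subclass hierarchy would rule out. The same check rejects an "impossible" ordering, such as running text recognition before any regions exist.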

Journal

Journal title: Research Ideas and Outcomes

Year: 2022

ISSN: 2367-7163

DOI: https://doi.org/10.3897/rio.8.e94349